Experiments on the automatic induction of German semantic verb classes
Abstract
This article presents clustering experiments on German verbs: A statistical grammar model for German serves as the source for a distributional verb description at the lexical syntax–semantics interface, and the unsupervised clustering algorithm k-means uses the empirical verb properties to perform an automatic induction of verb classes. Various evaluation measures are applied to compare the clustering results to gold standard German semantic verb classes under different criteria. The primary goals of the experiments are (1) to empirically utilize and investigate the well-established relationship between verb meaning and verb behavior within a cluster analysis and (2) to investigate the required technical parameters of a cluster analysis with respect to this specific linguistic task. The clustering methodology is developed on a small-scale verb set and then applied to a larger-scale verb set including 883 German verbs.
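The core idea, k-means clustering over distributional verb descriptions, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the article: the six verbs, the three invented frame types, the probability values, the Euclidean distance, and the hand-picked seed verbs. The article's actual features come from a statistical grammar model and are far richer.

```python
from math import dist

# Hypothetical toy data (not from the article): each verb maps to a
# probability distribution over three invented subcategorisation frames
# (intransitive, transitive, ditransitive).
VERBS = {
    "laufen":   [0.80, 0.15, 0.05],
    "gehen":    [0.85, 0.10, 0.05],
    "geben":    [0.05, 0.25, 0.70],
    "schenken": [0.05, 0.20, 0.75],
    "essen":    [0.30, 0.65, 0.05],
    "trinken":  [0.25, 0.70, 0.05],
}


def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def kmeans(data, seeds, max_iter=100):
    """Cluster verbs with k-means, using the seed verbs as initial centroids.

    Euclidean distance is an assumption of this sketch; other distance
    measures could be substituted for `dist`.
    """
    centroids = [list(data[s]) for s in seeds]
    assignment = {}
    for _ in range(max_iter):
        # Assignment step: attach each verb to its nearest centroid.
        new_assignment = {
            verb: min(range(len(centroids)),
                      key=lambda c: dist(vec, centroids[c]))
            for verb, vec in data.items()
        }
        if new_assignment == assignment:
            break  # converged: no verb changed its cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [data[v] for v, a in assignment.items() if a == c]
            if members:
                centroids[c] = mean(members)
    clusters = [[] for _ in centroids]
    for verb, c in assignment.items():
        clusters[c].append(verb)
    return [sorted(cluster) for cluster in clusters]


clusters = kmeans(VERBS, seeds=["laufen", "geben", "essen"])
print(clusters)
```

On this toy input the verbs fall into three groups that mirror their dominant frame: `[['gehen', 'laufen'], ['geben', 'schenken'], ['essen', 'trinken']]`. Seeding from fixed verbs keeps the run deterministic; random initialisation would be the more usual choice and can produce different clusterings.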
Related resources
Inducing German Semantic Verb Classes from Purely Syntactic Subcategorisation Information
The paper describes the application of k-means, a standard clustering technique, to the task of inducing semantic classes for German verbs. Using probability distributions over verb subcategorisation frames, we obtained an intuitively plausible clustering of 57 verbs into 14 classes. The automatic clustering was evaluated against independently motivated, hand-constructed semantic verb classes. A ...
Latent Semantic Clustering of German Verbs with Treebank Data
Treebank data have been used as data sources for a wide range of tasks in computational linguistics, including statistical parsing, anaphora resolution, and the induction of valence lexica. More recently, researchers have experimented with extracting semantic information from syntactically annotated data. Here, treebank data have been used for the purposes of identifying selectional preference...
The Representation of German Prepositional Verbs in a Semantically Based Computer Lexicon
We describe the treatment of verbs with prepositional complements in HaGenLex, a semantically based computer lexicon for German. Prepositional verbs such as bestehen auf (‘insist on’) subcategorize for a prepositional phrase where the preposition usually has no independent meaning of its own. The lexical semantic information in HaGenLex is specified by means of MultiNet, a full-fledged knowledg...
Automatic Induction of German Aspectual Verb Classes in a Distributional Framework
The central question of this study is whether aspectual verb classes (Vendler, 1967) can be induced from corpus data in a fully automatic, distributionally motivated procedure. We propose an operationalization of ‘aspectivity’ utilizing distributional information about nominal fillers in the argument positions of verbs in combination with aspectual features automatically derived from dependency...